The data is atp_tennis, which contains the following columns: ['Tournament', 'Date', 'Series', 'Court', 'Surface', 'Round', 'Best of', 'Player_1', 'Player_2', 'Winner', 'Rank_1', 'Rank_2', 'Pts_1', 'Pts_2', 'Odd_1', 'Odd_2', 'score'].
--- The description of each column in this data is:
Tournament: Name of the tennis tournament (Brisbane International, Chennai Open, Qatar Exxon Mobil Open ...etc)
Date: Date the match was played (year-month-day)
Series: Category or level of the tennis tournament (ATP250, ATP500, Masters1000 and Grand Slams offer 250, 500, 1000, and 2000 ranking points to the winner, respectively.)
Court: Place the match was held (Indoors or Outdoors)
Surface: Type of court surface (Hard, Grass and Clay)
Round: Stage of the tournament (1st Round, 2nd Round, Quarterfinals, Semifinal and The Final)
Best of: Tournament format ("best of 3" or "best of 5")
Player_1: Name of the first competitor in each match
Player_2: Name of the second competitor in each match
Winner: Name of the player (Player_1 or Player_2) who won the match
Rank_1: World ranking of Player_1 at the time of the match
Rank_2: World ranking of Player_2 at the time of the match
Pts_1: Ranking points held by Player_1 before the match (accumulated over the season and higher points typically indicate a more successful player)
Pts_2: Ranking points held by Player_2 before the match
Odd_1: Betting odds for Player_1 (reflecting their perceived chances of winning)
Odd_2: Betting odds for Player_2 
score: Final score of the match ("score of Player_1" - "score of Player_2")
---
Consider the contents of the dataset and the requirements from the user. Please note: DO NOT CHANGE FILE AND VARIABLE NAMES THAT I HAVE SET!
Interactions begin:
[USER (data scientist)]: Can you help me find the matches in the atp_tennis dataset where Roger Federer is either Player_1 or Player_2? I would like to have a list of Match IDs for these matches. Please generate and display a list of match indices involving the player 'Federer R.' in the ATP tennis dataset.

[YOU (AI assistant)]: Sure, I can help you with that. To find the matches where Roger Federer is either Player_1 or Player_2, you can use the following code:
'''
import pandas as pd
import numpy as np
import pickle

atp_tennis = pd.read_csv('atp_tennis.csv')

federer_matches = atp_tennis[(atp_tennis['Player_1'] == 'Federer R.') | (atp_tennis['Player_2'] == 'Federer R.')]
federer_match_ids = federer_matches.index.tolist()

print(federer_match_ids)

'''
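
As a quick sanity check, here is a minimal sketch of the same filter on a small made-up DataFrame. The rows and the names `toy`, `mask`, and `toy_match_ids` are hypothetical and not taken from atp_tennis.csv; the point is only to show that the boolean mask picks up 'Federer R.' whether he is listed as Player_1 or Player_2:

```python
import pandas as pd

# Hypothetical toy rows, for illustration only (not real match data)
toy = pd.DataFrame({
    'Player_1': ['Federer R.', 'Opponent A', 'Opponent B', 'Opponent C'],
    'Player_2': ['Opponent A', 'Opponent B', 'Federer R.', 'Opponent A'],
})

# True for every row where Federer appears in either player column
mask = (toy['Player_1'] == 'Federer R.') | (toy['Player_2'] == 'Federer R.')

# The DataFrame index serves as the match ID, as in the code above
toy_match_ids = toy[mask].index.tolist()
print(toy_match_ids)
```

Rows 0 and 2 are the only ones that involve 'Federer R.', so those are the indices the list contains.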

[USER (data scientist)]: Hey, I have a dataset of ATP tennis matches in a CSV file called "atp_tennis.csv". I'd like to analyze the performance of a player named "Raonic M." against all opponents that Roger Federer has faced. Can you help me with that? 

[YOU (AI assistant)]: Sure! First, let's load the dataset using pandas. Do you have pandas installed? 

[USER (data scientist)]: Yes, I have pandas installed. 

[YOU (AI assistant)]: Great! Let's start by importing pandas and loading the dataset: 
'''
import pandas as pd

# Load the dataset 
atp_tennis = pd.read_csv("atp_tennis.csv") 
'''
[USER (data scientist)]: Awesome! Now, can you find all the matches where Federer has played? Please generate and display a list of unique opponents who have played against 'Federer R.' in the ATP tennis dataset.

[YOU (AI assistant)]: Sure, let's filter the dataset to find all the matches where Federer is either Player_1 or Player_2, and then collect the unique opponents from those matches:
'''
import pandas as pd
import pickle

atp_tennis = pd.read_csv('atp_tennis.csv')

federer_matches = atp_tennis[(atp_tennis['Player_1'] == 'Federer R.') | (atp_tennis['Player_2'] == 'Federer R.')]   
federer_opponents = federer_matches['Player_1'].where(federer_matches['Player_1'] != 'Federer R.', federer_matches['Player_2']).unique()   

print(federer_opponents)

'''
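
To make the `where` step easier to follow, here is a small sketch on made-up rows (the data and the names `toy` and `toy_opponents` are hypothetical). `Series.where(cond, other)` keeps the original value where `cond` is True and takes the value from `other` where it is False, so each row yields whichever player is not 'Federer R.':

```python
import pandas as pd

# Hypothetical toy rows in which Federer appears in every match
toy = pd.DataFrame({
    'Player_1': ['Federer R.', 'Opponent A', 'Federer R.'],
    'Player_2': ['Opponent A', 'Federer R.', 'Opponent B'],
})

# Keep Player_1 when it is not Federer; otherwise fall back to Player_2
toy_opponents = toy['Player_1'].where(
    toy['Player_1'] != 'Federer R.', toy['Player_2']
).unique()

print(toy_opponents)
```

'Opponent A' shows up in two matches but appears only once in the result because of `unique()`.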

[USER (data scientist)]: Great! Now, can you calculate the surface performance of Raonic M. against each of Federer's opponents? Please define a function to calculate the overall win rate, head-to-head (h2h) record, and performance on different surface types for a specified player against a specific opponent in the ATP tennis dataset.

[YOU (AI assistant)]: Sure, let's define a function that calculates the win rate, the head-to-head record, and the per-surface performance of a player against a given opponent:
'''
import pandas as pd
import numpy as np
import pickle

atp_tennis = pd.read_csv('atp_tennis.csv')

def calculate_performance(player, opponent, data, surface_types):
    """Return (win_rate, h2h_record, surface_performance) for player vs. opponent."""
    # All matches between the two players, in either Player_1/Player_2 order
    player_matches = data[((data['Player_1'] == player) & (data['Player_2'] == opponent)) | ((data['Player_1'] == opponent) & (data['Player_2'] == player))]
    total_matches = len(player_matches)
    wins = len(player_matches[player_matches['Winner'] == player])
    # Both values are the fraction of these matches won by player (0 if they never met)
    win_rate = wins / total_matches if total_matches > 0 else 0
    h2h_record = wins / total_matches if total_matches > 0 else 0

    # Win fraction per surface; surfaces with no matches are reported as 0
    surface_performance = {}
    for surface in surface_types:
        player_surface_matches = player_matches[player_matches['Surface'] == surface]
        player_surface_wins = len(player_surface_matches[player_surface_matches['Winner'] == player])
        surface_performance[surface] = player_surface_wins / player_surface_matches.shape[0] if player_surface_matches.shape[0] > 0 else 0

    return win_rate, h2h_record, surface_performance

'''
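
Here is a minimal usage sketch on made-up data, in case it helps to see what the function returns. The function is repeated in compact form so the snippet runs on its own; the rows and the names 'Player A', 'Player B', 'Player C', and `toy` are hypothetical and say nothing about real results in atp_tennis.csv:

```python
import pandas as pd

def calculate_performance(player, opponent, data, surface_types):
    # Matches between the two players, in either Player_1/Player_2 order
    player_matches = data[((data['Player_1'] == player) & (data['Player_2'] == opponent))
                          | ((data['Player_1'] == opponent) & (data['Player_2'] == player))]
    total_matches = len(player_matches)
    wins = len(player_matches[player_matches['Winner'] == player])
    win_rate = wins / total_matches if total_matches > 0 else 0
    h2h_record = wins / total_matches if total_matches > 0 else 0

    surface_performance = {}
    for surface in surface_types:
        on_surface = player_matches[player_matches['Surface'] == surface]
        surface_wins = len(on_surface[on_surface['Winner'] == player])
        surface_performance[surface] = surface_wins / on_surface.shape[0] if on_surface.shape[0] > 0 else 0
    return win_rate, h2h_record, surface_performance

# Hypothetical toy rows, for illustration only (not real match data)
toy = pd.DataFrame({
    'Player_1': ['Player A', 'Player B', 'Player A', 'Player A'],
    'Player_2': ['Player B', 'Player A', 'Player B', 'Player C'],
    'Winner':   ['Player A', 'Player B', 'Player A', 'Player C'],
    'Surface':  ['Hard', 'Clay', 'Hard', 'Grass'],
})

win_rate, h2h_record, surface_performance = calculate_performance(
    'Player A', 'Player B', toy, ['Hard', 'Clay', 'Grass']
)
print(win_rate, h2h_record, surface_performance)
```

In this toy data, Player A wins 2 of 3 matches against Player B, both of them on Hard. Grass is reported as 0 because the two never met on that surface, so a 0 from this function can mean either "no wins" or "no matches"; checking the match count separately would be needed to tell those cases apart.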

[USER (data scientist)]: Great job on the function! Next, we'll create a dataframe to hold the results. We'll then loop through Federer's opponents to determine Raonic M.'s performance. Additionally, could you answer this question: On which surface does Raonic M. maintain a 100% win rate against Davydenko N.?

A. Clay surface
B. Grass surface
C. Hard surface
D. Carpet surface
E. None of the above

[YOU (AI assistant)]: